Covid19 Japan

Covid19 Japanが独自に収集している陽性者単位のデータ(個票データ)。ソースとデータは全てGitHubにて公開されており、データはJSON形式。「レコード数 \(\neq\) 累計陽性者数」であることに注意。

 

Import

Covid19 JapanGitHubで公開しているデータは前述のようにJSON形式であり、最新データはlatest.jsonファイルで示されている。このため、読み込む際はひと工夫必要。

Patient Data

陽性者単位の個票データ。

path <- "https://raw.githubusercontent.com/reustle/covid19japan-data/master/docs/patient_data/"

df <- path %>% 
  paste0("latest.json") %>% 
  readr::read_lines() %>% 
  paste0(path, .) %>% 
  jsonlite::fromJSON()

df

Summary Data 【参考】

死亡者数や重症者数などの推移データはsummaryフォルダ内のJSON形式ファイルにまとめられている。summaryフォルダの他にsummary_minフォルダというフォルダがあるが、summary_minフォルダ内のJSONファイルは単に改行を省略して小さくしたファイル。

path <- "https://raw.githubusercontent.com/reustle/covid19japan-data/master/docs/summary/"

df_s <- path %>% 
  paste0("latest.json") %>% 
  readr::read_lines() %>% 
  paste0(path, .) %>% 
  jsonlite::fromJSON()

df_s %>% summary()
##             Length Class      Mode     
## prefectures 27     data.frame list     
## regions     12     data.frame list     
## daily       37     data.frame list     
## updated      1     -none-     character

 
要約すると分かるように3つのデータフレーム(都道府県単位、八地方区分単位、日次)と一つのベクトル(更新日時)から構成されている。

 

都道府県単位集計

更新日次時点における都道府県単位での累積値。陽性者・死亡者などの時系列集計データはネストで格納されている。
厚生労働省のオープンデータが集計から除いている空港検疫・ダイヤモンドプリンセス・長崎クルーズ船・その他を含めて全51区分。

df_s$prefectures

 

地方単位集計

更新日次時点における八地方区分単位での累積値。陽性者・死亡者などの時系列集計データは都道府県単位と同様にネストで格納されている。
ただし、確認した時点(2020/11/3)では、時系列集計値の合計と累積値が一致しない。

df_s$regions

 

日次集計

個票データを日次で集計したもの。累積値の他に移動平均も含まれているが、暗黙の欠落を含んだデータである点に注意が必要。

df_s$daily

 

更新日時

の更新日時が記録されている。

df_s$updated
## [1] "2020-11-03T23:51:44+09:00"

 

Area Data

Covid19 Japanのデータは個票データなので、様々な属性が含まれている。これらの属性を利用して地域・地方ごとの分析を行う場合に便利な都道府県データを用意した。このデータはGistで公開している。

 

Others

その他のオープンデータ

新型コロナウイルス対策病床オープンデータ

 

Data Wrangling (tidy and transform)

Summarize

最初にデータがどのようになっているか確認する。これには要約に便利なskimrパッケージを用いる。

df %>% 
  skimr::skim()
Data summary
Name Piped data
Number of rows 104960
Number of columns 23
_______________________
Column type frequency:
character 19
logical 3
numeric 1
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
patientId 0 1.00 1 8 0 103294 0
dateAnnounced 0 1.00 10 10 0 280 0
gender 15053 0.86 1 1 0 2 0
detectedPrefecture 0 1.00 3 15 0 49 0
patientStatus 100997 0.04 8 23 0 8 0
notes 54352 0.48 1 270 0 47846 1
mhlwPatientNumber 104511 0.00 1 11 0 434 0
prefecturePatientNumber 12887 0.88 5 20 0 92064 0
prefectureSourceURL 73656 0.30 5 224 0 3439 0
residence 22895 0.78 1 38 0 1422 0
sourceURL 637 0.99 1 239 0 7973 0
relatedPatients 94564 0.10 2 259 0 6345 0
knownCluster 102478 0.02 3 88 0 229 0
detectedCityTown 78961 0.25 2 22 0 663 0
cityPrefectureNumber 79226 0.25 1 34 0 25725 2
citySourceURL 93128 0.11 9 317 0 3637 0
deceasedDate 103156 0.02 10 10 0 230 0
deceasedReportedDate 103746 0.01 10 62 0 204 0
deathSourceURL 103890 0.01 14 123 0 651 0

Variable type: logical

skim_variable n_missing complete_rate mean count
confirmedPatient 0 1 0.98 TRU: 103293, FAL: 1667
charterFlightPassenger 104946 0 1.00 TRU: 14
cruisePassengerDisembarked 104949 0 1.00 TRU: 11

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
ageBracket 0 1 32.37 23.67 -1 20 30 50 100 ▅▇▅▂▁

 
元がJSON形式なので、読み込んだ直後は殆どの変量(フィーチャー)が文字型になっていることが分かる。また、意外と欠損が多いことも分かる。

 

Transform

各変量(フィーチャー)を適切な形式に変換し、地域区分でも分析できるように都道府県データと結合する。

x <- df %>% 
  dplyr::select(patientId, date = dateAnnounced, gender,
                pref = detectedPrefecture, patientStatus, knownCluster,
                confirmedPatient, charterFlightPassenger,
                cruisePassengerDisembarked, ageBracket,
                deceasedDate, deceasedReportedDate) %>% 
  dplyr::filter(confirmedPatient == TRUE) %>% 
  dplyr::mutate(date = lubridate::as_date(date),
                gender = forcats::as_factor(gender),
                patientStatus = forcats::as_factor(patientStatus),
                cluster = dplyr::if_else(!is.na(knownCluster), TRUE, FALSE),
                ageBracket = forcats::as_factor(ageBracket),
                deceasedDate = lubridate::as_date(deceasedDate),
                deceasedReportedDate = lubridate::as_date(deceasedReportedDate)) %>% 
  dplyr::left_join(prefs, by = c("pref" = "pref")) %>% 
  dplyr::select(-`推計人口`) %>% 
  dplyr::rename(Pref = `都道府県`, region = `八地方区分`)

x

変換結果を要約してみると

x %>% 
  skimr::skim()
Data summary
Name Piped data
Number of rows 103293
Number of columns 19
_______________________
Column type frequency:
character 3
Date 3
factor 9
logical 4
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
patientId 0 1.00 1 8 0 103293 0
pref 0 1.00 3 15 0 49 0
knownCluster 100840 0.02 3 88 0 227 0

Variable type: Date

skim_variable n_missing complete_rate min max median n_unique
date 0 1 2020-01-15 2020-11-03 2020-08-13 280
deceasedDate 102914 0 2020-02-13 2020-10-17 2020-05-08 150
deceasedReportedDate 102963 0 2020-02-13 2020-10-17 2020-05-16 131

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
gender 14409 0.86 FALSE 2 M: 49916, F: 38968
patientStatus 100760 0.02 FALSE 8 Hos: 1261, Dec: 371, Hom: 315, Dis: 283
ageBracket 0 1.00 FALSE 13 20: 24465, 30: 15417, -1: 14508, 40: 12751
pcode 1207 0.99 FALSE 47 13: 31532, 27: 13106, 14: 8881, 23: 6411
Pref 1207 0.99 FALSE 47 東京都: 31532, 大阪府: 13106, 神奈川: 8881, 愛知県: 6411
region 1207 0.99 FALSE 8 関東地: 53651, 近畿地: 20594, 九州地: 10958, 中部地: 10027
広域圏 7963 0.92 FALSE 8 首都圏: 53869, 近畿圏: 20026, 中部圏: 8700, 九州圏: 7563
通俗的区分 1207 0.99 FALSE 11 関東: 53651, 関西: 20026, 東海: 8352, 九州: 7563
fct_pref 1207 0.99 FALSE 47 Tok: 31532, Osa: 13106, Kan: 8881, Aic: 6411

Variable type: logical

skim_variable n_missing complete_rate mean count
confirmedPatient 0 1 1.00 TRU: 103293
charterFlightPassenger 103279 0 1.00 TRU: 14
cruisePassengerDisembarked 103282 0 1.00 TRU: 11
cluster 0 1 0.02 FAL: 100840, TRU: 2453

 
文字型を因子型に変換するだけでも大まかな傾向が見えるようになる。例えば

  • 年齢別で見ると20代、30代、年齢不明(恐らく非回答)、40代の順に多い
  • 都道府県別では東京、大阪、神奈川、愛知の順と人口にほぼ比例
  • 地方区分で見ると関東、近畿、九州、中部となっており九州地方が以外と多い

ことが読める。

patientStatusは以下の通りで、ほぼ更新されていないのと思われる。死者数などの推移を見る場合はサマリデータを使った方がいい。

x %>% 
  dplyr::group_by(patientStatus) %>% 
  dplyr::summarise(n = n()) %>% 
  dplyr::ungroup() %>% 
  dplyr::mutate(Japanese = c("回復", "入院中", "退院済", "死亡", "詳細不明",
                             "重症", "自宅療養", "ホテル療養", NA))

 

全国集計

 

都道府県別集計

都道府県別の総陽性者数と人口千人あたりの陽性者率を求める。

x %>% 
  dplyr::group_by(Pref) %>% 
  dplyr::summarise(n = n()) %>% 
  dplyr::left_join(prefs, by = c("Pref" = "都道府県")) %>% 
  dplyr::select(Pref, n, population = `推計人口`) %>% 
  dplyr::mutate(rate = round(n / population, 2))

上位10県を累計人数と人口千人あたりの陽性者数で比べてみる。

x %>% 
  dplyr::group_by(Pref) %>% 
  dplyr::summarise(n = n()) %>% 
  dplyr::left_join(prefs, by = c("Pref" = "都道府県")) %>% 
  dplyr::select(Pref, n, population = `推計人口`) %>% 
  dplyr::mutate(rate = round(n / population, 2)) %>% 
  dplyr::slice_max(order_by = n, n = 10) %>% 
  dplyr::rename(`累計陽性者数` = n, `推計人口[千人]` = population, `率` = rate)
x %>% 
  dplyr::group_by(Pref) %>% 
  dplyr::summarise(n = n()) %>% 
  dplyr::left_join(prefs, by = c("Pref" = "都道府県")) %>% 
  dplyr::select(Pref, n, population = `推計人口`) %>% 
  dplyr::mutate(rate = round(n / population, 2)) %>% 
  dplyr::slice_max(order_by = rate, n = 10) %>% 
  dplyr::rename(`累計陽性者数` = n, `推計人口[千人]` = population, `率` = rate)

累計の陽性者数は、ほぼ、人口に比例しているが、一部の県での感染率が高いことが分かる。

 

八地方区分別集計

地方区分で比較すると都道府県と同様に人口の多い関東、近畿はともかく、九州、北海道の陽性者率が高いことが分かる。

region <- prefs %>% 
  dplyr::group_by(`八地方区分`) %>% 
  dplyr::summarise(population = sum(`推計人口`)) %>% 
  dplyr::rename(region = `八地方区分`)

x %>% 
  dplyr::group_by(region) %>% 
  dplyr::summarise(n = n()) %>% 
  dplyr::left_join(region, by = c("region" = "region")) %>% 
  dplyr::select(region, n, population) %>% 
  dplyr::mutate(rate = round(n / population, 2))

 

 

日次集計

日次の陽性者数、前日比、累計を求める。

x %>%
  dplyr::filter(!is.na(Pref)) %>% 
  dplyr::group_by(date) %>% 
  dplyr::summarise(n = n()) %>% 
  tidyr::complete(date = seq.Date(from = min(date), to = max(date), by = "day"),
                  fill = list(n = 0L)) %>% 
  dplyr::mutate(diff = n - dplyr::lag(n, default = 0L), cum = cumsum(n))

 

都道府県別日次集計

x_prefs <- x %>% 
  dplyr::filter(!is.na(Pref)) %>% 
  dplyr::group_by(date, Pref) %>% 
  dplyr::summarise(n = n()) %>% 
  dplyr::ungroup() %>% 
  tidyr::pivot_wider(names_from = Pref, values_from = n, values_fill = 0L) %>% 
  tidyr::pivot_longer(cols = -date, names_to = "Pref", values_to = "n")
x_prefs
x_prefs_diff <- x_prefs %>% 
  tidyr::pivot_wider(names_from = Pref, values_from = n, values_fill = 0L) %>% 
  dplyr::mutate_if(is.integer, .funs = lagdiff) %>% 
  tidyr::pivot_longer(cols = -date, names_to = "Pref", values_to = "diff")
x_prefs_diff
x_prefs_cum <- x_prefs %>% 
  tidyr::pivot_wider(names_from = Pref, values_from = n, values_fill = 0L) %>% 
  dplyr::mutate_if(is.integer, .funs = cumsum) %>% 
  tidyr::pivot_longer(cols = -date, names_to = "Pref", values_to = "cum")
x_prefs_cum
x_by_prefs <- x_prefs %>% 
  dplyr::left_join(x_prefs_diff, by = c("date" = "date", "Pref" = "Pref")) %>% 
  dplyr::left_join(x_prefs_cum, by = c("date" = "date", "Pref" = "Pref")) %>% 
  dplyr::left_join(prefs, ., by = c("都道府県" = "Pref")) %>% 
  dplyr::mutate(Pref = forcats::fct_inorder(`都道府県`)) %>% 
  dplyr::select(date, Pref, n, diff, cum) %>% 
  dplyr::arrange(date)
x_by_prefs

 

八地方区分別日次集計

x_region <- x %>% 
  dplyr::filter(!is.na(Pref)) %>% 
  dplyr::group_by(date, region) %>% 
  dplyr::summarise(n = n()) %>% 
  dplyr::ungroup() %>% 
  tidyr::pivot_wider(names_from = region, values_from = n, values_fill = 0L) %>% 
  tidyr::pivot_longer(cols = -date, names_to = "region", values_to = "n")
x_region
x_region_diff <- x_region %>% 
  tidyr::pivot_wider(names_from = region, values_from = n, values_fill = 0L) %>% 
  dplyr::mutate_if(is.integer, .funs = lagdiff) %>% 
  tidyr::pivot_longer(cols = -date, names_to = "region", values_to = "diff")
x_region_diff
x_region_cum <- x_region %>% 
  tidyr::pivot_wider(names_from = region, values_from = n, values_fill = 0L) %>% 
  dplyr::mutate_if(is.integer, .funs = cumsum) %>% 
  tidyr::pivot_longer(cols = -date, names_to = "region", values_to = "cum")
x_region_cum
x_by_region_ma <- x_region %>% 
  tidyr::pivot_wider(names_from = region, values_from = n, values_fill = 0L) %>% 
  dplyr::mutate_if(is.integer, .funs = ma7) %>% 
  tidyr::pivot_longer(cols = -date, names_to = "region", values_to = "ma7")
x_by_region_ma  
x_by_region <- x_region %>% 
  dplyr::left_join(x_region_diff, by = c("date" = "date", "region" = "region")) %>% 
  dplyr::left_join(x_region_cum, by = c("date" = "date", "region" = "region")) %>% 
  dplyr::left_join(x_by_region_ma, by = c("date" = "date", "region" = "region")) %>% 
  dplyr::mutate(region = forcats::fct_relevel(region,
                                              "北海道地方", "東北地方", "関東地方",
                                              "中部地方", "近畿地方", "中国地方",
                                              "四国地方", "九州地方")) %>%
  dplyr::select(date, region, n, diff, cum, ma7) %>% 
  dplyr::arrange(date)
x_by_region

Visualizing

全国の日次推移

sec_scale <- 100
datetime <- lubridate::as_datetime(df_s$updated, tz = "Japan")

x %>% 
  dplyr::group_by(date) %>% 
  dplyr::filter(!is.na(Pref)) %>% 
  dplyr::summarise(n = n()) %>% 
  tidyr::complete(date = seq.Date(from = min(date), to = max(date), by = "day"),
                  fill = list(n = 0L)) %>% 
  dplyr::mutate(diff = lagdiff(n), cum = cumsum(n),
                ma7 = zoo::rollmeanr(n, k = 7L, na.pad = TRUE)) %>%
  ggplot2::ggplot(ggplot2::aes(x = date)) + 
    ggplot2::geom_bar(ggplot2::aes(y = n), stat = "identity", width = 1.0,
                      alpha = 0.5) + 
    ggplot2::geom_line(ggplot2::aes(y = ma7), linetype = "dotted", size = 0.5) + 
    ggplot2::geom_line(ggplot2::aes(y = cum / sec_scale), colour = "dark green") +
    ggplot2::scale_y_continuous(
      name = "陽性者数(単日)・移動平均(点線)",
      sec.axis = ggplot2::sec_axis(~ . * sec_scale,
                                    name = "累積陽性者数(折線)")) + 
    ggplot2::labs(title = paste0("@", datetime), x = "")

x %>% 
  dplyr::group_by(date) %>% 
  dplyr::filter(!is.na(Pref)) %>% 
  dplyr::summarise(n = n()) %>% 
  tidyr::complete(date = seq.Date(from = min(date), to = max(date), by = "day"),
                  fill = list(n = 0L)) %>% 
  dplyr::mutate(diff = lagdiff(n), cum = cumsum(n)) %>%
  ggplot2::ggplot(ggplot2::aes(x = date)) + 
    ggplot2::geom_line(ggplot2::aes(y = diff), colour = "dark green", alpha = 0.5) + 
    ggplot2::labs(title = paste0("@", datetime), x = "", y = "陽性者数前日差")

 

地方別の日次推移

陽性者数(単日)

x_by_region %>% 
  ggplot2::ggplot(ggplot2::aes(x = date, y = n)) + 
    ggplot2::geom_bar(ggplot2::aes(y = n, fill = region), stat = "identity",
                      width = 1.0, alpha = 0.5) + 
    ggplot2::labs(title = paste0("単日 @", datetime),
                  x = "", y = "陽性者数[人]") 

 

陽性者数(単日)の移動平均

x_by_region %>% 
  ggplot2::ggplot(ggplot2::aes(x = date, y = ma7, colour = region)) + 
    ggplot2::geom_line(size = 1) +
    ggplot2::theme(legend.position = 'none') +
    ggplot2::labs(title = paste0("7日間移動平均 @", datetime),
                  x = "", y = "陽性者数[人]") + 
    ggrepel::geom_text_repel(ggplot2::aes(label = region),
                             data = subset(x_by_region, date == max(date)),
                             nudge_x = 30, segment.alpha = 0.5, size = 3) + 
    ggplot2::lims(x = c(min(x_by_region$date),
                        max(x_by_region$date) + 45))

 

陽性者数(累積)

x_by_region %>% 
  ggplot2::ggplot(ggplot2::aes(x = date, y = cum, colour = region)) + 
    ggplot2::geom_line() +
    ggplot2::theme(legend.position = 'none') + 
    ggplot2::labs(title = paste0("累積陽性者数@", datetime), x = "", y = "累積[人]") + 
    ggrepel::geom_text_repel(ggplot2::aes(label = region),
                             data = subset(x_by_region, date == max(date)))

 

陽性者数(単日)+累積陽性者数

sec_scale <- 50
datetime <- lubridate::as_datetime(df_s$updated, tz = "Japan")

x_by_region %>% 
  ggplot2::ggplot(ggplot2::aes(x = date)) + 
    ggplot2::geom_bar(ggplot2::aes(y = n, fill = region), stat = "identity",
                      alpha = 0.5, width = 1.0) + 
    ggplot2::geom_line(ggplot2::aes(y = cum / sec_scale, colour = region)) +
    ggplot2::facet_wrap(~ region) + 
    ggplot2::theme(legend.position = 'none') + 
    ggplot2::labs(title = paste0("Fixed scale @", datetime), x = "", y = "") + 
    ggplot2::scale_y_continuous(
      name = "陽性者数(単日)",
      sec.axis = ggplot2::sec_axis(~ . * sec_scale,
                                    name = "累積陽性者数(折線)")
    )

x_by_region %>% 
  ggplot2::ggplot(ggplot2::aes(x = date)) + 
    ggplot2::geom_bar(ggplot2::aes(y = n, fill = region), stat = "identity",
                      alpha = 0.5, width = 1.0) + 
    ggplot2::geom_line(ggplot2::aes(y = ma7, colour = region),
                       linetype = "solid", size = 0.25) + 
    ggplot2::geom_line(ggplot2::aes(y = cum / sec_scale, colour = region)) +
    ggplot2::facet_wrap(~ region, scales = "free_y") + 
    ggplot2::theme(legend.position = 'none') + 
    ggplot2::labs(title = paste0("Free Y scale @", datetime), x = "", y = "") + 
    ggplot2::scale_y_continuous(
      name = "陽性者数(単日)",
      sec.axis = ggplot2::sec_axis(~ . * sec_scale,
                                    name = "累積陽性者数(折線)")
    )

陽性者数(前日差)

x_by_region %>% 
  ggplot2::ggplot(ggplot2::aes(x = date)) + 
    ggplot2::geom_line(ggplot2::aes(y = diff, colour = region)) +
    ggplot2::facet_wrap(~ region, scales = "free_y") + 
    ggplot2::theme(legend.position = 'none') + 
    ggplot2::labs(title = paste0("陽性者数前日差, Free Y scale @", datetime),
                  x = "", y = "")

 

都道府県別日次推移

sec_scale <- 100
ncol <- 5
datetime <- lubridate::as_datetime(df_s$updated, tz = "Japan")


x_by_prefs %>% 
  ggplot2::ggplot(ggplot2::aes(x = date)) + 
    ggplot2::geom_bar(ggplot2::aes(y = n, fill = Pref), stat = "identity",
                      alpha = 0.25, width = 1.0) + 
    ggplot2::geom_line(ggplot2::aes(y = cum / sec_scale, colour = Pref)) +
    ggplot2::facet_wrap(~ Pref, ncol = ncol) + 
    ggplot2::theme(legend.position = 'none') + 
    ggplot2::labs(title = paste0("Fixed scale @", datetime), x = "", y = "") + 
    ggplot2::scale_y_continuous(
      name = "陽性者数(単日)",
      sec.axis = ggplot2::sec_axis(~ . * sec_scale,
                                    name = "累積陽性者数(折線)")
    )

x_by_prefs %>% 
  ggplot2::ggplot(ggplot2::aes(x = date)) + 
  ggplot2::geom_bar(ggplot2::aes(y = n, fill = Pref), stat = "identity",
                      alpha = 0.25, width = 1.0) + 
    ggplot2::geom_line(ggplot2::aes(y = cum / sec_scale, colour = Pref)) +
    ggplot2::facet_wrap(~ Pref, ncol = ncol, scales = "free_y") + 
    ggplot2::theme(legend.position = 'none') + 
    ggplot2::labs(title = paste0("Free Y scale @", datetime), x = "", y = "") + 
    ggplot2::scale_y_continuous(
      name = "陽性者数(単日)",
      sec.axis = ggplot2::sec_axis(~ . * sec_scale,
                                    name = "累積陽性者数(折線)")
    )

x_by_prefs %>% 
  ggplot2::ggplot(ggplot2::aes(x = date)) + 
    ggplot2::geom_line(ggplot2::aes(y = diff, colour = Pref)) +
    ggplot2::facet_wrap(~ Pref, ncol = ncol, scales = "free_y") + 
    ggplot2::theme(legend.position = 'none') + 
    ggplot2::labs(title = paste0("陽性者数前日差, Free Y scale @", datetime),
                  x = "", y = "")

都道府県別

x %>% 
  dplyr::group_by(Pref) %>% 
  dplyr::summarise(n = n()) %>% 
  dplyr::left_join(prefs, by = c("Pref" = "都道府県")) %>% 
  dplyr::select(Pref, n, population = `推計人口`) %>% 
  dplyr::mutate(rate = round(n / population, 2)) %>% 
  ggplot2::ggplot(ggplot2::aes(x = population, y = n) ) + 
    ggplot2::geom_point(ggplot2::aes(colour = Pref)) + 
    ggrepel::geom_text_repel(ggplot2::aes(label = Pref, colour = Pref)) + 
    ggplot2::theme(legend.position = 'none') + 
    ggplot2::labs(title = "", x = "推計人口[千人]", y = "累計陽性者数")

x %>% 
  dplyr::group_by(Pref) %>% 
  dplyr::summarise(n = n()) %>% 
  dplyr::left_join(prefs, by = c("Pref" = "都道府県")) %>% 
  dplyr::select(Pref, n, population = `推計人口`) %>% 
  dplyr::mutate(rate = round(n / population, 2)) %>% 
  dplyr::filter(n < 1000) %>% 
  # dplyr::slice_min(order_by = n, n = 38) %>% 
  ggplot2::ggplot(ggplot2::aes(x = population, y = n) ) + 
    ggplot2::geom_point(ggplot2::aes(colour = Pref)) + 
    ggrepel::geom_text_repel(ggplot2::aes(label = Pref, colour = Pref)) + 
    ggplot2::theme(legend.position = 'none') + 
    ggplot2::labs(title = "累計陽性者数千人未満", x = "推計人口[千人]", y = "累計陽性者数")

地方区分別

region <- prefs %>% 
  dplyr::group_by(`八地方区分`) %>% 
  dplyr::summarise(population = sum(`推計人口`)) %>% 
  dplyr::rename(region = `八地方区分`)

x %>% 
  dplyr::group_by(region) %>% 
  dplyr::summarise(n = n()) %>% 
  dplyr::left_join(region, by = c("region" = "region")) %>% 
  dplyr::select(region, n, population) %>% 
  dplyr::mutate(rate = round(n / population, 2)) %>% 
  ggplot2::ggplot(ggplot2::aes(x = population, y = n) ) + 
    ggplot2::geom_point(ggplot2::aes(colour = region)) + 
    ggrepel::geom_text_repel(ggplot2::aes(label = region, colour = region)) + 
    ggplot2::theme(legend.position = 'none') + 
    ggplot2::labs(title = "", x = "推計人口[千人]", y = "累計陽性者数")